
Add micro batching and endpoints for v1 list_models and get_model #41

Merged
baixiac merged 3 commits into main from llm-gen2 on Mar 13, 2026
Conversation

@baixiac
Member

@baixiac baixiac commented Mar 6, 2026

feat: add endpoints for v1 models and list models
feat: add micro batching and lower CPU usage during model loading
feat: ensure the pad token for generative models
feat: use the async streamer during async generation
feat: apply timeout to text generation
fix: fix the property name for stop sequences in OpenAI requests
docker: add the GPU image build and remove per-model
chore: upgrade uv and tidy up the docker folder
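The headline change in the commit list above is micro batching: queue incoming generation requests and run the model once per small batch instead of once per request. The PR does not include the implementation here, so the following is a minimal sketch of the general technique using asyncio; all names (`MicroBatcher`, `submit`, `max_batch_size`, `flush_interval`) are hypothetical, not the PR's actual API.

```python
import asyncio


class MicroBatcher:
    """Hypothetical micro-batching sketch: collect requests into batches.

    Requests are queued; a background loop drains up to `max_batch_size`
    items, waiting at most `flush_interval` seconds for stragglers, then
    calls the model once on the whole batch and resolves each caller's
    future with its own result.
    """

    def __init__(self, predict_fn, max_batch_size=8, flush_interval=0.01):
        self._predict_fn = predict_fn
        self._max = max_batch_size
        self._interval = flush_interval
        self._queue = asyncio.Queue()
        self._task = None

    def start(self):
        # Launch the background batching loop on the running event loop.
        self._task = asyncio.get_running_loop().create_task(self._loop())

    async def submit(self, item):
        # Enqueue one request and wait for its individual result.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut

    async def _loop(self):
        while True:
            # Block until at least one request arrives.
            batch = [await self._queue.get()]
            deadline = asyncio.get_running_loop().time() + self._interval
            # Top up the batch until it is full or the flush window closes.
            while len(batch) < self._max:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self._queue.get(), timeout)
                    )
                except asyncio.TimeoutError:
                    break
            inputs = [item for item, _ in batch]
            try:
                outputs = self._predict_fn(inputs)
                for (_, fut), out in zip(batch, outputs):
                    fut.set_result(out)
            except Exception as exc:
                # Fail every request in the batch, not just one.
                for _, fut in batch:
                    fut.set_exception(exc)


async def demo():
    # Stand-in "model" that doubles each input.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    batcher.start()
    return await asyncio.gather(*(batcher.submit(i) for i in range(5)))


print(asyncio.run(demo()))  # -> [0, 2, 4, 6, 8]
```

The flush interval is the usual latency/throughput knob: a longer window packs bigger batches (better GPU utilization) at the cost of added per-request latency.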

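The commit messages also mention ensuring a pad token for generative models. Many Hugging Face causal-LM tokenizers ship without one, which breaks padded/batched generation; the common workaround is to fall back to the EOS token. The sketch below shows that pattern against a dummy stand-in object (the PR's actual helper name and location are unknown; `ensure_pad_token` and `DummyTokenizer` are illustrative).

```python
class DummyTokenizer:
    """Stand-in for an HF-style tokenizer that lacks a pad token."""

    def __init__(self):
        self.pad_token = None
        self.pad_token_id = None
        self.eos_token = "</s>"
        self.eos_token_id = 2


def ensure_pad_token(tokenizer):
    """Fall back to EOS as the pad token when none is configured.

    Without a pad token, padded batch generation typically errors out;
    reusing EOS as padding is the conventional fix for decoder-only models.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer


tok = ensure_pad_token(DummyTokenizer())
print(tok.pad_token, tok.pad_token_id)  # -> </s> 2
```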
@baixiac force-pushed the llm-gen2 branch 2 times, most recently from 78b021e to 7f7c36f on March 10, 2026 10:48
@baixiac merged commit 98554f8 into main on Mar 13, 2026
7 checks passed
@baixiac deleted the llm-gen2 branch on March 13, 2026 14:08


1 participant